Language models as tools for investigating the distinction between possible and impossible natural languages

Kallini, Julie, Potts, Christopher

arXiv.org Artificial Intelligence

December 5, 2025

Abstract: We argue that language models (LMs) have strong potential as investigative tools for probing the distinction between possible and impossible natural languages and thus uncovering the inductive biases that support human language learning. We outline a phased research program in which LM architectures are iteratively refined to better discriminate between possible and impossible languages, supporting linking hypotheses to human cognition. Which conceivable linguistic systems are possible for humans to learn and use as natural languages? A complete answer to this question would yield profound insights into the human capacity for language. However, our tools for addressing the question are very limited.


Studies with impossible languages falsify LMs as models of human language

Bowers, Jeffrey S., Mitchell, Jeff

arXiv.org Artificial Intelligence

Jeffrey S. Bowers, School of Psychology and Neuroscience, University of Bristol; Jeff Mitchell, School of Engineering and Informatics, University of Sussex. Commentary on Futrell, R., & Mahowald, K. (in press), "How linguistics learned to stop worrying and love the language models."

Abstract: According to Futrell and Mahowald (F&M), both infants and language models (LMs) find attested languages easier to learn than "impossible languages" that have unnatural structures. We review the literature and show that LMs often learn attested and many impossible languages equally well. The impossible languages that are difficult to learn are simply more complex (or random).


Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible

Ziv, Imry, Lan, Nur, Chemla, Emmanuel, Katzir, Roni

arXiv.org Artificial Intelligence

Are large language models (LLMs) sensitive to the distinction between humanly possible languages and humanly impossible languages? This question is taken by many to bear on whether LLMs and humans share the same innate learning biases. Previous work has attempted to answer it in the positive by comparing LLM learning curves on existing language datasets and on "impossible" datasets derived from them via various perturbation functions. Using the same methodology, we examine this claim on a wider set of languages and impossible perturbations. We find that in most cases, GPT-2 learns each language and its impossible counterpart equally easily, in contrast to previous claims. We also apply a more lenient condition by testing whether GPT-2 provides any kind of separation between the whole set of natural languages and the whole set of impossible languages. By considering cross-linguistic variance in various metrics computed on the perplexity curves, we show that GPT-2 provides no systematic separation between the possible and the impossible. Taken together, these perspectives show that LLMs do not share the human innate biases that shape linguistic typology.
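The perturbation methodology this abstract describes (deriving an "impossible" dataset from a natural one) can be sketched in a few lines. The functions below are illustrative stand-ins, not the paper's actual perturbations, assuming sentences are represented as token lists:

```python
import random

def full_shuffle(tokens, seed=0):
    """Deterministically shuffle a sentence's tokens (an 'impossible' word order)."""
    rng = random.Random(seed)
    out = tokens[:]
    rng.shuffle(out)
    return out

def reverse(tokens):
    """Reverse the token order of a sentence."""
    return tokens[::-1]

def even_odd(tokens):
    """Interleave even- and odd-indexed tokens, a counting-based perturbation."""
    return tokens[0::2] + tokens[1::2]

sentence = "the cat sat on the mat".split()
print(even_odd(sentence))  # ['the', 'sat', 'the', 'cat', 'on', 'mat']
```

Applying one such function uniformly to a corpus yields the "impossible" counterpart dataset on which learning curves can then be compared against the unperturbed original.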


Large Language Model probabilities cannot distinguish between possible and impossible language

Leivada, Evelina, Montero, Raquel, Morosi, Paolo, Moskvina, Natalia, Serrano, Tamara, Aguilar, Marcel, Guenther, Fritz

arXiv.org Artificial Intelligence

A controversial test for Large Language Models concerns the ability to discern possible from impossible language. While some evidence attests to the models' sensitivity to what crosses the limits of grammatically impossible language, this evidence has been contested on the grounds of the soundness of the testing material. We use model-internal representations to tap directly into the way Large Language Models represent the 'grammatical-ungrammatical' distinction. In a novel benchmark, we elicit probabilities from 4 models and compute minimal-pair surprisal differences, juxtaposing probabilities assigned to grammatical sentences to probabilities assigned to (i) lower frequency grammatical sentences, (ii) ungrammatical sentences, (iii) semantically odd sentences, and (iv) pragmatically odd sentences. The prediction is that if string-probabilities can function as proxies for the limits of grammar, the ungrammatical condition will stand out among the conditions that involve linguistic violations, showing a spike in the surprisal rates. Our results do not reveal a unique surprisal signature for ungrammatical prompts, as the semantically and pragmatically odd conditions consistently show higher surprisal. We thus demonstrate that probabilities do not constitute reliable proxies for model-internal representations of syntactic knowledge. Consequently, claims about models being able to distinguish possible from impossible language need verification through a different methodology.
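The minimal-pair surprisal differences described here reduce to simple arithmetic on model probabilities. A minimal sketch, assuming per-token probabilities have already been obtained from a model (the function names and the toy probabilities are our own, not the paper's):

```python
import math

def surprisal(token_probs):
    """Total surprisal (in bits) of a sentence, given its per-token
    probabilities under some language model."""
    return sum(-math.log2(p) for p in token_probs)

def minimal_pair_diff(probs_grammatical, probs_variant):
    """Surprisal difference for a minimal pair: positive values mean the
    variant is more surprising than the grammatical baseline."""
    return surprisal(probs_variant) - surprisal(probs_grammatical)

# Hypothetical per-token probabilities for a grammatical sentence and an
# ungrammatical variant differing only in its final token.
diff = minimal_pair_diff([0.2, 0.1, 0.3], [0.2, 0.1, 0.05])
```

If string probabilities tracked grammaticality, such differences would be consistently larger for ungrammatical variants than for merely odd ones; the abstract reports that they are not.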


Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs

Yang, Xiulin, Aoyama, Tatsuya, Yao, Yuekun, Wilcox, Ethan

arXiv.org Artificial Intelligence

Do LLMs offer insights into human language learning? A common argument against this idea is that because their architecture and training paradigm are so vastly different from humans, LLMs can learn arbitrary inputs as easily as natural languages. In this paper, we test this claim by training LMs to model impossible and typologically unattested languages. Unlike previous work, which has focused exclusively on English, we conduct experiments on 12 natural languages from 4 language families. Our results show that while GPT-2 small can primarily distinguish attested languages from their impossible counterparts, it does not achieve perfect separation between all the attested languages and all the impossible ones. We further test whether GPT-2 small distinguishes typologically attested from unattested languages with different NP orders by manipulating word order based on Greenberg's Universal 20. We find that the model's perplexity scores do not distinguish attested vs. unattested word orders, as long as the unattested variants maintain constituency structure. These findings suggest that language models exhibit some human-like inductive biases, though these biases are weaker than those found in human learners.


Kallini et al. (2024) do not compare impossible languages with constituency-based ones

Hunter, Tim

arXiv.org Artificial Intelligence

A central goal of linguistic theory is to find a precise characterization of the notion "possible human language", in the form of a computational device that is capable of describing all and only the languages that can be acquired by a typically developing human child. The success of recent large language models (LLMs) in NLP applications arguably raises the possibility that LLMs might be computational devices that meet this goal. This would only be the case if, in addition to succeeding in learning human languages, LLMs struggle to learn "impossible" human languages. Kallini et al. (2024; "Mission: Impossible Language Models", Proc. ACL) conducted experiments aiming to test this by training GPT-2 on a variety of synthetic languages, and found that it learns some more successfully than others. They present these asymmetries as support for the idea that LLMs' inductive biases align with what is regarded as "possible" for human languages, but the most significant comparison has a confound that makes this conclusion unwarranted. In this paper I explain the confound and suggest some ways forward towards constructing a comparison that appropriately tests the underlying issue.


No Such Thing as a General Learner: Language models and their dual optimization

Chemla, Emmanuel, Nefdt, Ryan M.

arXiv.org Artificial Intelligence

To this question, we first argue that neither humans nor LLMs are general learners, in a variety of senses. We make a novel case for how in particular LLMs follow a dual-optimization process: they are optimized during their training (which is typically compared to language acquisition), and modern LLMs have also been selected, through a process akin to natural selection in a species. From this perspective, we argue that the performance of LLMs, whether similar or dissimilar to that of humans, does not weigh easily on important debates about the importance of human cognitive biases for language. In section 4, we discuss the consequences of this for the current field, which is structured around benchmarks mostly concerned with measures of the final, trained states of LLMs. In section 5, we apply our arguments to the evaluations more focused on the learning stages of LLMs. One debate asks whether LLMs are not too powerful, often phrased around the question as to whether 'impossible' languages, which allegedly cannot be learned by humans, can be learned by LLMs. We add to the debate the fact that, even when trained to learn possible languages, parts of the languages that LLMs learn are indeed impossible. This shows that the biases of LLMs are different from ours, and reminds us that an adequate model of learning has to learn.


Mission: Impossible Language Models

Kallini, Julie, Papadimitriou, Isabel, Futrell, Richard, Mahowald, Kyle, Potts, Christopher

arXiv.org Artificial Intelligence

Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and on the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations to assess the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim. More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.
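As one illustration of a rule "based on counting word positions" of the kind this abstract places toward the not-intuitively-impossible end of the continuum, here is a minimal sketch; it is a simplified stand-in for the paper's actual synthetic languages, with the marker token and position parameter chosen for illustration:

```python
def hop_marker(tokens, k=3, marker="<M>"):
    """A counting-based rule of the kind linguists consider impossible for
    natural language: insert a marker after the k-th token, regardless of
    the sentence's syntactic structure."""
    return tokens[:k] + [marker] + tokens[k:]

print(hop_marker("the cat sat on the mat".split()))
```

No attested language places a grammatical marker by counting a fixed number of tokens; applying such a rule corpus-wide produces one of the "impossible" training languages against which English serves as the control.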